FAST, FLEXIBLE, RATIONAL INDUCTIVE INFERENCE


1  Research  issues 

Despite great advances in artificial intelligence (AI) research over the last fifty years, computers are still
far  worse  than  people  at  solving  many  important  problems,  such  as  learning  language,  inferring  categories 
of  objects  from  just  a  few  examples,  and  identifying  causal  relationships.  The  goal  of  this  project  was  to 
develop  automated  systems  that  can  match  human  performance  in  problems  of  this  kind.  The  approach 
that  was  taken  to  achieving  this  goal  is  one  that  contributed  to  the  first  AI  systems:  identifying  the  formal 
principles  that  characterize  how  people  solve  these  problems.  This  required  combining  mathematical  tools 
from  computer  science  and  statistics  with  the  empirical  methods  of  cognitive  psychology.  By  exploiting  the 
interplay  between  these  disciplines,  the  resulting  research  provided  insight  into  how  we  can  make  machines 
learn,  and  a  deeper  understanding  of  how  the  human  mind  works. 

Part  of  the  challenge  of  machine  learning  research  is  that  it  requires  dealing  with  a  different  kind  of 
problem  from  traditional  AI.  Learning  language,  forming  categories,  and  identifying  causal  relationships 
are all inductive problems, where the limited data available leave the solution radically underdetermined, as
opposed  to  the  deductive  problems  such  as  mathematical  reasoning  and  game-playing  in  which  AI  research 
has  traditionally  been  successful.  Induction  has  a  bad  reputation  in  philosophy,  having  been  called  a  scandal, 
a  riddle,  and  a  myth,  and  one  reason  for  this  is  that  there  is  no  consensus  on  exactly  how  such  inferences 
should  be  formalized,  in  contrast  to  the  broadly  accepted  standard  of  deductive  logic.  By  studying  how 
people  solve  such  problems,  this  project  aimed  to  identify  some  of  the  formal  principles  that  characterize 
successful  inductive  inferences.  A  general  overview  of  these  ideas  appears  in  Tenenbaum,  Kemp,  Griffiths, 
and Goodman (2011) and Griffiths, Tenenbaum, and Kemp (2012).

Building  on  recent  work  in  both  AI  and  cognitive  science,  this  project  explored  the  possibility  that 
Bayesian  statistics  can  provide  formal  solutions  to  inductive  problems.  Bayesian  statistics  is  based  upon 
a  simple  principle  that  dictates  how  a  rational  agent  should  change  his  or  her  beliefs  in  light  of  evidence, 
called  Bayes’  rule.  Assume  that  a  learner  is  evaluating  a  set  of  hypotheses,  and  has  assigned  a  “prior” 
probability  P(h)  to  each  hypothesis  h  in  that  set.  Then,  Bayes’  rule  indicates  that  after  seeing  data  d,  the 
learner should assign each hypothesis a “posterior” probability P(h|d) proportional to P(h) multiplied by
the probability of observing d if h were true, P(d|h). Bayes’ rule is a principled way to combine constraints
on  hypotheses  from  prior  knowledge  with  the  evidence  provided  by  data,  and  motivates  much  contemporary 
research  in  statistical  artificial  intelligence  and  machine  learning. 
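
As a minimal sketch of the update described above, the Python snippet below applies Bayes' rule to a discrete set of hypotheses. The coin example, the function name `posterior`, and the specific probabilities are illustrative assumptions and are not taken from the project.

```python
def posterior(prior, likelihood):
    """Return P(h|d) for each hypothesis h, given P(h) and P(d|h)."""
    # Numerator of Bayes' rule: P(h) * P(d|h) for every hypothesis.
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    # Normalizing constant P(d), summed over the whole hypothesis set.
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical example: is a coin fair, or biased towards heads?
prior = {"fair": 0.5, "biased": 0.5}
# Probability of observing three heads in a row under each hypothesis.
likelihood = {"fair": 0.5 ** 3, "biased": 0.9 ** 3}
post = posterior(prior, likelihood)
```

With equal priors, the hypothesis that makes the observed data more probable receives the larger posterior probability.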

Despite  the  promise  of  Bayesian  inference  as  a  framework  for  studying  human  inductive  inference,  this 
approach  faces  two  serious  challenges.  First,  probabilistic  inference  is  extremely  computationally  intensive, 
particularly  with  the  large  numbers  of  complex  hypotheses  needed  to  model  realistic  human  performance. 
Second,  capturing  the  kinds  of  inferences  that  people  are  capable  of  making  requires  going  beyond  a  simple 
evaluation  of  a  fixed,  discrete  set  of  hypotheses,  and  considering  how  we  can  define  models  that  are  flexible, 
being  able  to  support  rich  hypothesis  spaces  that  can  adapt  to  accommodate  the  data.  The  research  supported 
by  this  grant  focused  on  addressing  these  two  challenges,  with  the  goal  of  producing  automated  systems 
capable  of  fast,  flexible,  rational  inductive  inference. 

The  research  was  divided  into  two  objectives,  each  addressed  by  three  different  lines  of  work.  The 
remainder  of  this  report  summarizes  the  results  of  these  lines  of  work.  Publications  resulting  from  the  work 
are cited and appear in the reference list. These publications summarize the data collected as part of the grant
and  describe  relevant  models  at  a  level  where  they  can  be  re-implemented. 

2  Objective  1:  Psychological  and  neural  mechanisms  for  fast  inductive  inference 

Despite  their  growing  popularity  in  cognitive  science,  probabilistic  models  of  cognition  are  often  criticized 
for  not  identifying  psychological  or  neural  mechanisms  that  might  support  the  computations  involved  in 
Bayesian inference. Part of the issue stems from the fact that probabilistic models provide a different kind of
explanation  of  human  cognition  than  other  approaches,  answering  questions  about  the  abstract  computational 
problems  involved  in  cognition.  However,  answers  at  this  level  of  analysis  should  guide  investigation  at  the 
levels  of  algorithm  and  implementation,  and  one  of  the  challenges  for  the  probabilistic  approach  is  to  develop 
a  clear  account  of  how  the  underlying  computations  could  be  carried  out. 

Efficient implementation of probabilistic inference is not just a problem in cognitive science: it is an issue
that also arises in computer science and statistics, where it has led to a number of promising solutions. This
research explored whether these solutions seem to correspond to the way that people make inductive
inferences, and whether the strategies that people use might lead to new approximation schemes that can be
applied in machine learning. In particular, this aspect of the project focused on the potential implications
of  Monte  Carlo  algorithms,  which  approximate  a  probability  distribution  with  a  set  of  samples  from  that 
distribution.  Sophisticated  Monte  Carlo  schemes  provide  methods  for  recursively  updating  a  set  of  samples 
from a distribution as more data are obtained, providing an answer to the question of how learners with finite
memory  resources  might  be  able  to  maintain  a  distribution  over  a  large  hypothesis  space. 

The  research  funded  by  this  grant  explored  three  important  issues  inspired  by  this  idea:  how  well  Monte 
Carlo methods work as accounts of human inductive inference, how these methods connect to neural computation,
and how human cognition might inspire new methods for approximating probabilistic inference. An
overview  of  these  ideas  appears  in  Griffiths,  Vul,  and  Sanborn  (2012). 

Objective  1.1:  Evaluating  Monte  Carlo  methods  as  rational  process  models 

Since no approximation scheme is perfect, we can study whether the way in which different approximations
behave corresponds to the patterns we see in human behavior. For example, some approximation methods
exhibit  order  effects  (particularly  primacy  effects),  something  that  has  been  identified  as  a  challenge  for 
Bayesian  models  of  cognition.  The  research  supported  by  this  grant  examined  whether  sophisticated  Monte 
Carlo  techniques  perform  in  a  way  that  is  similar  to  human  inference  across  a  range  of  inductive  problems. 

Our  research  examined  two  kinds  of  Monte  Carlo  algorithms:  particle  filters,  and  Markov  chain  Monte 
Carlo. A particle filter approximates a sequentially updated posterior probability distribution with a set of
samples from that distribution, adjusting the samples as more data are observed in order to maintain a good
approximation. Abbott and Griffiths (2011) showed that particle filters can produce both primacy and recency
effects, two kinds of order effects that emerge in human causal learning. These results demonstrate that this
kind  of  Monte  Carlo  method  might  provide  a  viable  explanation  for  order  effects  observed  for  other  aspects 
of  human  cognition,  addressing  one  of  the  main  empirical  challenges  to  Bayesian  models  of  cognition. 
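
The following is a minimal sketch of a particle filter over a discrete hypothesis space, offered as an illustration of the general technique rather than as the model used by Abbott and Griffiths (2011). The coin-bias hypotheses, the function names, and the choice to resample after every observation are assumptions made for the example.

```python
import random


def coin_likelihood(d, theta):
    """P(d | theta) for one flip: d is 1 for heads, 0 for tails."""
    return theta if d == 1 else 1.0 - theta


def particle_filter(hypotheses, prior, likelihood, data, n_particles=100, seed=0):
    """Approximate the posterior over hypotheses with a set of samples."""
    rng = random.Random(seed)
    # Initialize the particles by sampling hypotheses from the prior.
    particles = rng.choices(
        hypotheses, weights=[prior[h] for h in hypotheses], k=n_particles
    )
    for d in data:
        # Weight each particle by how well its hypothesis predicts the new datum.
        weights = [likelihood(d, h) for h in particles]
        if sum(weights) == 0:
            break  # every particle was ruled out; this sketch keeps the old set
        # Resample in proportion to the weights to form the updated sample set.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return particles


# Hypothetical example: five candidate coin biases with a uniform prior.
hypotheses = [0.1, 0.3, 0.5, 0.7, 0.9]
prior = {h: 1.0 / len(hypotheses) for h in hypotheses}
data = [1, 1, 1, 0, 1, 1, 1, 1]
particles = particle_filter(hypotheses, prior, coin_likelihood, data)
```

Because particles are only ever resampled from the current set, a hypothesis that drops out early cannot be recovered later, which is one way a sampler of this kind can become sensitive to the order of the data.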

Our work on Markov chain Monte Carlo examined the contexts where a random walk in a representational
space might account for human behavior. Abbott, Austerweil, and Griffiths (2013) showed that this
simple stochastic mechanism can account for a set of findings in the literature on semantic memory. A random
walk on a semantic network reproduces the main empirical phenomena of semantic fluency, demonstrating
a “clustering” of semantically related items. This provides a far simpler explanation for these phenomena
than  previous  proposals. 
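
To make the mechanism concrete, here is a small sketch of a random walk on a toy semantic network. The network, the function name `fluency_random_walk`, and the rule that an item is reported the first time the walk visits it are assumptions chosen for illustration; the published model may differ in its details.

```python
import random


def fluency_random_walk(graph, start, n_steps, seed=0):
    """Walk the network at random and report each item on its first visit."""
    rng = random.Random(seed)
    node = start
    produced = [start]
    seen = {start}
    for _ in range(n_steps):
        # Move to a uniformly chosen neighbor of the current node.
        node = rng.choice(graph[node])
        if node not in seen:
            seen.add(node)
            produced.append(node)
    return produced


# Hypothetical network: a cluster of pets and a cluster of farm animals,
# joined by a single link between "dog" and "horse".
graph = {
    "dog": ["cat", "hamster", "horse"],
    "cat": ["dog", "hamster"],
    "hamster": ["dog", "cat"],
    "horse": ["dog", "cow", "pig"],
    "cow": ["horse", "pig"],
    "pig": ["horse", "cow"],
}
responses = fluency_random_walk(graph, "dog", n_steps=50)
```

Since the walk can only leave a cluster through the one bridging link, items from the same cluster tend to be reported close together.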

Lieder,  Griffiths,  and  Goodman  (2013)  took  the  idea  of  Markov  chain  Monte  Carlo  as  a  cognitive  process 
in  a  different  direction,  examining  the  question  of  how  long  a  rational  learner  should  run  a  simulation  before 
making  a  decision.  The  surprising  result  of  this  analysis  was  that  when  there  is  a  cost  for  the  time  spent 
waiting  for  an  answer,  learners  should  simulate  for  a  fairly  short  amount  of  time,  resulting  in  a  significant  bias 
in  responses.  This  bias  turns  out  to  be  the  same  as  that  found  in  experiments  on  “anchoring  and  adjustment”, 
providing  a  compelling  rational  account  of  a  phenomenon  that  has  typically  been  taken  as  evidence  of  human 
irrationality.

Objective 1.2: Connecting to neural computation

Monte Carlo methods also provide a way to connect Bayesian inference and artificial neural networks, offering
a picture in which existing connectionist approaches provide a substrate for Bayesian inference.
Abbott, Hamrick, and Griffiths (2013) explored this connection, demonstrating that one of the Monte Carlo
algorithms that has previously been connected to human cognition, importance sampling, can be implemented
in a standard neural network model used to capture associative memory. This is an important result,
as it is the first demonstration that this kind of probabilistic inference can be performed using distributed
representations, potentially providing an avenue for performing probabilistic inference over structured
representations in a fixed neural architecture.
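
For readers unfamiliar with importance sampling, the sketch below shows the algorithm itself, not its neural network implementation. It estimates a posterior mean by drawing samples from the prior and weighting each sample by the likelihood of the data. The coin-flip setting, the function name, and the sample size are assumptions made for the example.

```python
import random


def posterior_mean_by_importance_sampling(n_heads, n_flips, n_samples=20000, seed=0):
    """Estimate E[theta | data] for a coin bias theta with a uniform prior."""
    rng = random.Random(seed)
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(n_samples):
        # Proposal distribution: sample theta from the uniform prior on [0, 1).
        theta = rng.random()
        # Importance weight: the likelihood of the observed flips under theta.
        weight = theta ** n_heads * (1.0 - theta) ** (n_flips - n_heads)
        weighted_sum += weight * theta
        weight_total += weight
    # Self-normalized estimate of the posterior mean.
    return weighted_sum / weight_total


estimate = posterior_mean_by_importance_sampling(n_heads=7, n_flips=10)
exact = (7 + 1) / (10 + 2)  # mean of the exact Beta(8, 4) posterior
```

Comparing `estimate` with `exact` gives a quick check on how well the weighted samples approximate the posterior.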

In another line of research, Griffiths, Austerweil, and Berthiaume (2012) connected Bayesian inference
and neural network models at a more abstract level of analysis. This work showed that simple neural networks
can be characterized as performing Bayesian inference with a particular prior distribution, providing
a  formal  foundation  for  exploring  the  similarities  and  differences  between  these  approaches. 

Objective  1.3:  Developing  new  approximation  algorithms 

Studying human cognition can provide clues that lead to new approximation algorithms. For example, human
minds often operate under more extreme constraints than most statistical algorithms. In this spirit,
we  developed  a  new  class  of  algorithms  for  probabilistic  inference  that  only  maintain  a  single  hypothesis 
in  memory  at  a  time,  and  only  take  a  single  pass  through  a  dataset  (Bonawitz,  Denison,  Chen,  Gopnik,  & 
Griffiths,  2011).  These  algorithms  are  based  on  the  “win-stay,  lose-shift”  principle,  sticking  (stochastically) 
with  a  hypothesis  as  long  as  it  explains  observed  data,  and  switching  away  when  it  does  not.  The  resulting 
algorithms  can  approximate  Bayesian  inference  as  well  as  a  single  sample  from  the  posterior  distribution.  A 
journal  article  presenting  a  series  of  experiments  comparing  these  algorithms  to  the  behavior  of  adults  and 
children  is  currently  under  review  (Bonawitz,  Denison,  Gopnik,  &  Griffiths,  under  review). 
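
One way to read the win-stay, lose-shift principle is sketched below. It assumes a specific variant: the learner keeps its current hypothesis with probability equal to the likelihood of the newest observation, and otherwise draws a replacement from the posterior given the data seen so far. The running posterior is tracked here only so that the replacement can be drawn; that bookkeeping, the coin-bias hypotheses, and all names are assumptions for illustration, not a description of the published algorithms.

```python
import random


def coin_likelihood(d, theta):
    """P(d | theta) for one flip: d is 1 for heads, 0 for tails."""
    return theta if d == 1 else 1.0 - theta


def wsls_learner(hypotheses, prior, likelihood, data, seed=0):
    """Return the single hypothesis held after one pass through the data."""
    rng = random.Random(seed)
    posterior = [prior[h] for h in hypotheses]
    # Start with one hypothesis sampled from the prior.
    current = rng.choices(hypotheses, weights=posterior, k=1)[0]
    for d in data:
        # Update the running posterior with the new observation.
        posterior = [p * likelihood(d, h) for p, h in zip(posterior, hypotheses)]
        total = sum(posterior)
        posterior = [p / total for p in posterior]
        # Win-stay: keep the hypothesis with probability P(d | current).
        # Lose-shift: otherwise sample a replacement from the posterior.
        if rng.random() >= likelihood(d, current):
            current = rng.choices(hypotheses, weights=posterior, k=1)[0]
    return current


# Hypothetical example: three candidate coin biases with a uniform prior.
hypotheses = [0.2, 0.5, 0.8]
prior = {h: 1.0 / 3.0 for h in hypotheses}
data = [1, 1, 0, 1, 1]
final_hypothesis = wsls_learner(hypotheses, prior, coin_likelihood, data)
```

Under this variant, the proportion of independent learners ending on each hypothesis should match the posterior probability of that hypothesis, which can be checked by rerunning the function with many different seeds.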

2.1  Objective  2:  Using  nonparametric  Bayesian  methods  to  define  flexible  probabilistic  models 

Human  inductive  inferences  are  characterized  by  flexibility:  people  are  able  to  learn  about  the  world  around 
them  without  having  rigid  constraints  placed  on  the  hypotheses  they  consider.  For  example,  we  would  not 
assume  that  there  are  just  three  types  of  things  in  the  world,  that  objects  can  only  possess  a  finite  number 
of  observable  features,  or  that  causal  relationships  have  to  be  linear.  However,  this  kind  of  assumption 
is  common  in  statistical  models,  where  constraints  are  placed  on  the  number  of  components  in  a  mixture 
model,  the  dimensionality  of  a  latent  space,  or  the  functional  form  of  a  dependency  between  two  variables. 

Recent  work  in  nonparametric  Bayesian  statistics  has  begun  to  make  it  possible  to  define  probabilistic 
models  that  consider  an  infinite  number  of  structured  hypotheses.  These  models  include  components  that  are 
based  on  complex  stochastic  processes,  such  as  the  Dirichlet  process  or  Gaussian  processes,  which  define 
distributions appropriate for nonparametric problems. For example, the Dirichlet process defines a distribution
on a discrete but countably infinite set of atoms. This makes it useful as a prior in an “infinite” mixture
model,  where  each  of  these  atoms  corresponds  to  the  parameters  of  a  mixture  component.  The  resulting 
model  can  identify  as  many  clusters  as  needed  in  order  to  capture  the  structure  of  observed  data,  rather  than 
being  limited  to  some  finite  number  a  priori. 
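
The way such a prior lets the number of clusters grow with the data can be illustrated with the Chinese restaurant process, the distribution over partitions induced by a Dirichlet process. The sketch below samples cluster assignments from that prior only; it does not fit a mixture model to data. The function name and the parameter values are assumptions made for the example.

```python
import random


def sample_crp(n_items, alpha, seed=0):
    """Sample cluster assignments for n_items from a Chinese restaurant process."""
    rng = random.Random(seed)
    assignments = []
    counts = []  # counts[k] is the number of items currently in cluster k
    for _ in range(n_items):
        # Join existing cluster k with probability proportional to counts[k],
        # or start a new cluster with probability proportional to alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if k == len(counts):
            counts.append(1)  # a new cluster is created on demand
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments


assignments = sample_crp(n_items=50, alpha=1.0)
n_clusters = len(set(assignments))
```

No maximum number of clusters is specified anywhere in the code; larger values of `alpha` tend to produce more clusters.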

The  research  supported  by  this  grant  explored  connections  between  nonparametric  Bayesian  statistics 
and  human  cognition,  focusing  on  whether  this  approach  can  result  in  the  kind  of  flexibility  that  is  needed  to 
model  human  inferences.  There  were  three  lines  of  research:  one  looked  at  learning  systems  of  categories, 
one  focused  on  inferring  the  features  of  objects,  and  one  considered  how  to  capture  the  properties  of  complex 
systems involving continuous quantities.

Objective 2.1: Learning systems of categories

Most  models  of  human  category  learning  focus  on  learning  a  small  number  of  unrelated  categories,  but  most 
actual  category  learning  involves  learning  a  large  number  of  related  categories.  For  example,  we  have  to 
simultaneously  learn  categories  like  “dog”,  “pet”,  and  “mammal”.  Canini  and  Griffiths  (2011)  extended  an 
existing  nonparametric  Bayesian  framework  -  hierarchical  Dirichlet  processes  -  to  address  learning  category 
structures  of  this  kind,  making  it  possible  to  infer  the  structure  of  a  taxonomy  relating  a  set  of  categories  as 
well as the categories themselves. This approach resulted in a powerful tool for inferring category representations
in any taxonomically organized domain, and produced results consistent with human behavior.

Feldman,  Griffiths,  Goldwater,  and  Morgan  (in  press)  explored  a  different  way  in  which  categories  can 
be related: through higher-level constraints that govern which categories can appear together. Specifically,
when learning phonetic categories (the sounds that make up speech), learning the words that those sounds
appear in provides sufficiently strong constraints that it converts the problem from one that is almost impossible
to solve into one that can be solved surprisingly well. The resulting system is the first to be able to infer
realistic  phonetic  categories  directly  from  simulated  speech  data. 

Objective  2.2:  Forming  feature-based  representations 

Nonparametric Bayesian models can also be used to infer the number and identity of the features that characterize
a set of objects. Griffiths and Ghahramani (2011) summarized the key ideas behind a novel approach to
solving this problem, based on a stochastic process known as the Indian buffet process. Austerweil and Griffiths
(2011) showed that this approach produces results that are consistent with human feature learning, and
Austerweil  and  Griffiths  (2010)  extended  this  approach  to  learn  features  that  are  invariant  to  a  particular  set 
of  transformations.  The  results  of  this  work  are  summarized  in  a  long  article  forthcoming  in  Psychological 
Review  (Austerweil  &  Griffiths,  in  press). 
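
As an illustration of the Indian buffet process, the sketch below samples a binary object-by-feature matrix from the prior, using the standard sequential construction: each object takes an existing feature with probability proportional to how many earlier objects have it, and then adds a Poisson-distributed number of new features. It does not perform inference from data. The function names and parameter values are assumptions, and the small Poisson sampler (Knuth's method) is included only to keep the example self-contained.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from a Poisson(lam) distribution using Knuth's method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_ibp(n_objects, alpha, seed=0):
    """Sample a binary feature matrix (a list of rows) from the IBP prior."""
    rng = random.Random(seed)
    feature_counts = []  # number of objects so far that possess each feature
    rows = []
    for i in range(1, n_objects + 1):
        # Take each existing feature k with probability feature_counts[k] / i.
        row = [1 if rng.random() < m / i else 0 for m in feature_counts]
        for k, z in enumerate(row):
            feature_counts[k] += z
        # Add Poisson(alpha / i) features that no earlier object possesses.
        n_new = sample_poisson(alpha / i, rng)
        row.extend([1] * n_new)
        feature_counts.extend([1] * n_new)
        rows.append(row)
    # Pad earlier rows with zeros so that every row has the same length.
    n_features = len(feature_counts)
    return [row + [0] * (n_features - len(row)) for row in rows]


Z = sample_ibp(n_objects=10, alpha=2.0)
```

The number of columns in `Z` is not fixed in advance: it is determined by the sampling process, mirroring the way the model infers how many features are needed.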

Objective  2.3:  Representing  objects  with  continuous  dimensions 

Developing flexible statistical models that can represent objects with continuous dimensions remains a challenge.
The work supported by this grant took on this challenge for two kinds of representations: images and
causal  relationships.  The  work  on  images  used  Bayesian  methods  to  identify  the  relevant  reference  frame 
and orientation of elements of a scene (Austerweil, Friesen, & Griffiths, 2011), the most representative image
in a collection (Abbott, Heller, Ghahramani, & Griffiths, 2011), and the actions that appear in a video
stream  (Buchsbaum,  Canini,  &  Griffiths,  2011).  In  each  of  these  cases,  a  flexible  Bayesian  model  produced 
predictions  that  were  closely  aligned  with  human  performance. 

Our  work  on  causal  systems  considered  how  existing  research  on  Bayesian  models  of  causal  relationships 
could  be  extended  to  capture  the  continuous  elements  of  these  relationships.  To  this  end,  we  considered  how 
to  define  appropriate  prior  distributions  on  the  strength  of  causal  relationships  (Yeung  &  Griffiths,  2011),  how 
people  infer  causal  relationships  expressed  in  continuous  time  (Pacer  &  Griffiths,  2012),  and  how  Bayesian 
inference  can  be  used  to  explain  a  set  of  results  in  perceptual  causality  that  have  traditionally  been  taken  as 
supporting  the  use  of  error-prone  heuristics  (Sanborn,  Mansinghka,  &  Griffiths,  in  press).  The  results  of  this 
research  are  a  set  of  models  of  human  causal  reasoning  that  can  directly  inform  work  on  causality  in  machine 
learning. 

3  Impact 

The impact of this research lies in two areas, corresponding to the two disciplines that were
brought together by this project: psychology and computer science. On the psychology side, the models
we  have  developed  provide  greater  insight  into  the  factors  that  are  involved  when  people  solve  inductive 
problems,  allowing  us  to  identify  methods  for  more  efficient  training  and  ways  of  dealing  with  cases  where 
the  approximations  that  people  use  lead  them  to  make  errors.  This  work  has  already  been  very  influential 
and generated a number of citations: the review article by Tenenbaum et al. (2011) already has 149 citations
in Google Scholar, despite having come out in 2011.

On  the  computer  science  side,  developing  a  deeper  understanding  of  the  formal  principles  that  underlie 
human  inductive  inference  will  pave  the  way  towards  building  automated  systems  that  match  and  ultimately 
exceed  human  performance.  In  particular,  understanding  how  people  efficiently  make  inferences  over  rich 
(and  potentially  unbounded)  hypothesis  spaces  will  be  a  first  step  towards  making  automated  systems  that 
are capable of fast, flexible, rational inductive inference. Our work has already had an impact on computer
science,  encouraging  the  development  of  more  sophisticated  systems  for  computer  vision  that  are  capable  of 
automatically  identifying  the  features  of  images. 